Open Science and
Research Software Engineering
Workshop
Center for Advanced Internet Studies (CAIS)
Quirin Würschinger
LMU Munich
September 21, 2023
> whoami
Quirin Würschinger
q.wuerschinger@lmu.de
Research associate and postdoc in (computational) linguistics
LMU Munich
Current work
research
lexical innovation on the web and in social networks
variation and change in language use and social polarization in social networks
using Large Language Models (LLMs) like ChatGPT for research in linguistics and social science.
teaching: corpus linguistics and research methodology
Promoting Open Science in (computational) linguistics at LMU
teaching and applying reproducible corpus-linguistic methods
creating and sharing corpora among researchers and students
Open Open Science workshop
Focus on …
ask questions
discuss
apply and practice
collaborate
Time table
09:00-09:30  Intro
09:30-10:30  Open Science principles
10:30-10:50  break
10:50-11:10  version control
11:10-12:00  project structure
12:00-12:30  data
12:30-13:30  break
13:30-14:00  code
14:00-14:30  methods
14:30-15:15  authoring
15:15-15:30  break
15:30-16:00  publishing
16:00-16:30  open issues and recap
Addressing different backgrounds and goals
Backgrounds and interests
CAIS: research on digitalization and the digital society
research fields
education and pedagogy
political science
sociology
communication studies
…
data and methods
qualitative interviews
text analysis
quantitative surveys
experimental designs
social media studies
…
Survey: main interests
reproducible workflows
managing files and folders
plain text authoring
programming with Python and R
methods
quantitative approaches
text analysis
questionnaires
publishing
authoring papers
sharing data and code
Who are you?
Please briefly introduce yourself …
name
place and position
your research interest in about 3 sentences for someone outside your field
What is Open Science?
Why should we do Open Science?
source
dataset/sample size
effect sizes
selection/number of relationships
flexibility in design
financial interests
hype around topic/field
What are the reasons why science can go wrong?
Roles in Open Science
Funders
make open science part of the selection process, and conditions for grantees conducting research.
Publishers
make open science part of the review process, and conditions for articles published in their journals.
Institutions
make open science part of academic training, and part of the selection process for research positions and evaluation for advancement and promotion.
Societies
make open science part of their awards, events, and scholarly norms.
Researchers
enact open science in their work and advocate for broader adoption in their communities.
[Center for Open Science]
Who profits from Open Science?
source
What is Open Science to you?
What do you find interesting, important, or attractive about Open Science?
https://tinyurl.com/opnsci
Learning outcomes
Implementing an open and reproducible workflow
version control
project structure
data
methods
code
authoring
publishing
git and GitHub/GitLab
git
software on your machine
git add src/tests.py
git commit -m 'add tests'
git push
GitHub and GitLab
services on a remote server
How to set up a GitHub repository
set up git
Installing git: see tutorial
Using git:
from the command line
using a standalone GUI tool
from within your editor/IDE
set up GitHub
tutorial
setting up git user information (name, password)
setting up GitHub authentication
setting and storing authentication (‘token’)
create a repository on GitHub
(create GitHub account)
click on New (https://github.com/new )
specify repo name
specify description
specify visibility: private or public
select Add a README file
specify licence
clone repositories
go to the folder where you want your project to live
git clone https://github.com/wuqui/opensciws.git
adding, committing, and pushing changes
(source)
git add src/tests.py
git commit -m 'add tests'
git push
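The add, commit, and push cycle above can also be scripted. The following is a minimal sketch in Python, assuming git is installed and the script runs inside a cloned repository. The helper names `build_git_commands` and `publish_changes` are hypothetical and not part of git or any library. By default the script only prints the commands (dry run) instead of executing them.

```python
import subprocess
from typing import List


def build_git_commands(paths: List[str], message: str) -> List[List[str]]:
    """Return the argument lists for git add, git commit, and git push."""
    if not paths:
        raise ValueError("at least one path is required")
    return [
        ["git", "add", *paths],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def publish_changes(paths: List[str], message: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is True, run it."""
    for command in build_git_commands(paths, message):
        print(" ".join(command))
        if not dry_run:
            # check=True raises an error and stops at the first failing command
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Same file and message as in the commands above; nothing is executed
    publish_changes(["src/tests.py"], "add tests")
```

Calling `publish_changes(..., dry_run=False)` would actually run the three commands in order, so it is worth checking the printed commands first.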
Let’s not pretend we’re all geniuses …
File names
File names should be:
machine-readable
human-readable
consistent
optional: play well with default ordering (e.g. include timestamps)
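The criteria above can be turned into a small naming helper. The following is a minimal sketch, assuming one possible convention (ISO date, short description, two-digit version number). The function name `make_filename` and the convention itself are illustrative assumptions, not a standard.

```python
import re
from datetime import date


def make_filename(day: date, description: str, version: int, extension: str) -> str:
    """Build a name of the form <ISO date>_<description>_v<NN>.<extension>."""
    # Lower-case the description and replace every run of other characters
    # (spaces, punctuation, non-ASCII letters) with a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain ASCII letters or digits")
    return f"{day.isoformat()}_{slug}_v{version:02d}.{extension.lstrip('.')}"


if __name__ == "__main__":
    print(make_filename(date(2023, 9, 21), "Interview Notes", 1, "txt"))
```

Because the ISO date comes first, an alphabetical listing of such files is also a chronological one, which covers the optional "default ordering" point.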
File structure
.
├── analysis <- all things data analysis
│ └── src <- functions and other source files
├── comm
│ ├── internal-comm <- internal communication such as meeting notes
│ └── journal-comm <- communication with the journal, e.g. peer review
├── data
│ ├── data_clean <- clean version of the data
│ └── data_raw <- raw data (don't touch)
├── dissemination
│ ├── manuscripts
│ ├── posters
│ └── presentations
├── documentation <- documentation, e.g. data management plan
└── misc <- miscellaneous files that don't fit elsewhere
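The tree above can be created in one step rather than by hand. The following is a minimal sketch using Python's standard library. The folder names are copied from the tree, while the function name `create_project` and the example root folder `my-project` are hypothetical.

```python
from pathlib import Path
from typing import List

# Folder names copied from the tree above
PROJECT_FOLDERS = [
    "analysis/src",
    "comm/internal-comm",
    "comm/journal-comm",
    "data/data_clean",
    "data/data_raw",
    "dissemination/manuscripts",
    "dissemination/posters",
    "dissemination/presentations",
    "documentation",
    "misc",
]


def create_project(root: str) -> List[Path]:
    """Create the folder skeleton under root and return the folder paths."""
    created = []
    for folder in PROJECT_FOLDERS:
        path = Path(root) / folder
        # parents=True also creates e.g. "analysis"; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_project("my-project"):
        print(path)
```

Running the script twice does no harm, since existing folders are left untouched.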
Practice: project management
You have until 11:50 h to work on either …
developing a project structure for your needs from scratch
refactoring/cleaning an existing project
Optionally: set up version control via git/GitHub for this project.
notebooks and literate programming
Types of data
interviews
questionnaires
web
social media
Authoring
How can we organise our project from the beginning so that we can publish outputs in the end?
Publishing
Where can I publish my work (platforms, research centers infrastructure, …)?
Quarto
single source → multiple output formats
PDF for publication outlets
blog
website
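One way to get several outputs from a single source is to call the Quarto command line once per output format. The following is a minimal sketch, assuming Quarto is installed. The file name `paper.qmd`, the chosen formats, and the helper names are hypothetical; `quarto render <file> --to <format>` is the only Quarto call used. Blogs and websites are set up as Quarto projects rather than as single formats, so `html` stands in for web output here.

```python
import subprocess
from typing import List


def build_render_commands(source: str, formats: List[str]) -> List[List[str]]:
    """Return one 'quarto render' argument list per output format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


def render_all(source: str, formats: List[str], dry_run: bool = True) -> None:
    """Print each render command and, unless dry_run is True, run it."""
    for command in build_render_commands(source, formats):
        print(" ".join(command))
        if not dry_run:
            # check=True raises an error if Quarto reports a failure
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands for a hypothetical paper.qmd
    render_all("paper.qmd", ["pdf", "html"])
```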
Publishing
How: How can we organise our project from the beginning so that we can publish outputs in the end?
Where: Where can I publish my work (platforms, research centers infrastructure, …)?
Outlets
arXiv
preprints
Zenodo
all kinds including data, code, preprints, etc.
GitHub and GitLab
code, software
Open Science Framework
all kinds including data, code, preprints, preregistration, etc.
Software Heritage
archival of code (long-term)
Papers with Code
code and data for and with papers, mostly Machine Learning
…
Resources
DRA
The Turing Way
Data Carpentry